Add image query support to the backend microservices by dmsuehir · Pull Request #12 · mhbuehler/GenAIComps

dmsuehir · 2024-12-02T23:15:39Z

Description

Updates the following microservices:

gateway: Updated to support multiple prompt formats depending on the model being used, updated send an image to the embedding service for first queries, and allow sending multiple images to the LVM.
lvm-llava: Updated to allow multiple images to be passed in and added support for other prompt formats depending on the LVM model. The lvm-llava-svc will also limit the number of images that get passed to the LVM using MAX_IMAGES
embedding: This microservice already allowed image input. Updated to pass that image through with the emedding output.
retriever: Updated to pass the original image through with the output (along with the image that was found with the similarity search)

Issues

https://github.com/opea-project/docs/blob/main/community/rfcs/24-10-02-GenAIExamples-001-Image_and_Audio_Support_in_MultimodalQnA.md

Type of change

List the type of change like below. Please delete options that are not relevant.

New feature (non-breaking change which adds new functionality)

Dependencies

No new dependencies

Tests

Tests have been updated

Signed-off-by: dmsuehir <[email protected]>

…ina/image_query

Signed-off-by: dmsuehir <[email protected]>

…ina/image_query

Signed-off-by: dmsuehir <[email protected]>

…to dina/image_query

Signed-off-by: dmsuehir <[email protected]>

Signed-off-by: okhleif-IL <[email protected]> * added in audio dict creation Signed-off-by: okhleif-IL <[email protected]> * separated audio from prompt Signed-off-by: okhleif-IL <[email protected]> * added ASR endpoint Signed-off-by: okhleif-IL <[email protected]> * removed ASR endpoints from mm embedding Signed-off-by: okhleif-IL <[email protected]> * edited return logic, fixed function call Signed-off-by: okhleif-IL <[email protected]> * added megaservice to elif Signed-off-by: okhleif-IL <[email protected]> * reworked helper func Signed-off-by: okhleif-IL <[email protected]> * Append audio to prompt Signed-off-by: okhleif-IL <[email protected]> * Reworked handle messages, added metadata Signed-off-by: okhleif-IL <[email protected]> * Moved dictionary logic to right place Signed-off-by: okhleif-IL <[email protected]> * changed logic to rely on message len Signed-off-by: okhleif-IL <[email protected]> * list --> empty str Signed-off-by: okhleif-IL <[email protected]> --------- Signed-off-by: Melanie Buehler <[email protected]> Signed-off-by: okhleif-IL <[email protected]> Signed-off-by: dmsuehir <[email protected]>

…ina/image_query Signed-off-by: dmsuehir <[email protected]>

for more information, see https://pre-commit.ci

Signed-off-by: okhleif-IL <[email protected]>

Signed-off-by: dmsuehir <[email protected]>

Signed-off-by: okhleif-IL <[email protected]>

Fixed role bug where enumeration was wrong

for more information, see https://pre-commit.ci

Signed-off-by: dmsuehir <[email protected]>

…nto dina/image_query Signed-off-by: dmsuehir <[email protected]>

Signed-off-by: dmsuehir <[email protected]>

Signed-off-by: Melanie Buehler <[email protected]>

Signed-off-by: dmsuehir <[email protected]>

dmsuehir · 2024-12-05T22:40:06Z

tests/cores/mega/test_multimodalqna_gateway.py

    else:
        print("request is from user.")
        text = req_dict["prompt"]
-        text = f"<image>\nUSER: {text}\nASSISTANT:"


It used to be that the LVM microservice would always add <image>\n and USER: to the beginning of the prompt that was created by the gateway, however I had to change that since images can now to scattered throughout the conversation (we can no longer just attach the single image to the first message in the conversation). I updated this mock of the LVM microservice to do something similar to what the actual LVM service does where it checks how many image tags already exist in the prompt, and adds extras if they are needed. The reason why we might need extras are when the retriever also gets an image from the vector store.

dmsuehir · 2024-12-05T22:41:58Z

tests/cores/mega/test_multimodalqna_gateway.py

        )
        # print(result_dict)
-        self.assertEqual(result_dict[self.lvm.name]["text"], "<image>\nUSER: chao, \nASSISTANT:")
+        self.assertEqual(result_dict[self.lvm.name]["text"], "USER: <image>\nchao, \nASSISTANT:")


Previously, the prompts had <image>\nUSER: but the prompt format documented by huggingface is the other way around USER: <image>\n and also since we will now have images interleaved in the conversation, I changed it to match the HF docs.

dmsuehir · 2024-12-05T22:45:05Z

tests/cores/mega/test_multimodalqna_gateway.py

+        formats than the default LLaVA 1.5 model.
+        """
+
+        # Models to test and their expected prompts


Normally I would've parameterized this test with the pytest decorator, however it seems like that won't work in this case because of them subclassing unittest.IsolatedAsyncioTestCase. Apparently there's another library called parameterized that could do it, but the unittest container that this is run in doesn't have that dependency and it's easier to avoid adding a third party dependency (stackoverflow reference for the issue).

So, instead I just have lists of the models and their expected prompt format, and I'm looping those to do the checks.

Signed-off-by: Melanie Buehler <[email protected]>

Adds unit test coverage for audio query

for more information, see https://pre-commit.ci

Signed-off-by: Melanie Buehler <[email protected]>

Fix port number placement

okhleif-10

LGTM, do all tests pass?

mhbuehler · 2024-12-06T23:16:47Z

comps/cores/mega/gateway.py

-        # Multimodal RAG QnA With Videos has not yet accepts image as input during QnA.
        num_messages = len(data["messages"]) if isinstance(data["messages"], list) else 1
+
+        # Multimodal RAG QnA With Videos has not yet accepts image as input during QnA.


Can you remove this comment or update it for accuracy?

mhbuehler · 2024-12-06T23:54:53Z

tests/cores/mega/test_multimodalqna_gateway.py

+                self.assertEqual(len(b64_types["image"]), 2)
+            finally:
+                test_gateway.stop()
+


Nice test, it passed for me

Signed-off-by: dmsuehir <[email protected]>

…nto dina/image_query

Signed-off-by: dmsuehir <[email protected]>

…mples Signed-off-by: dmsuehir <[email protected]>

dmsuehir · 2024-12-12T23:05:42Z

I rebased and synced this branch with mmqna-image-query since the audio input PR has been merged. I also moved the gateway updates to GenAIExamples, and reran the microservice tests for LVM, embddings multimodal, and retriever multimodal redis.

mhbuehler · 2024-12-13T17:31:11Z

tests/retrievers/test_retrievers_multimodal_redis_langchain.sh

+
+        if echo "$CONTENT" | grep -q "retrieved_docs"; then
+            echo "[ retriever ] Content has retrieved_docs as expected."
+            if echo "$CONTENT" | grep -q "retrieved_docs"; then


Is this grep supposed to look for img_b64_str?

Yes, good point. I will fix this.

Signed-off-by: dmsuehir <[email protected]>

mhbuehler

LGTM!

okhleif-10 · 2024-12-13T21:46:01Z

comps/lvms/llava/lvm.py

 logflag = os.getenv("LOGFLAG", False)

+# The maximum number of images that should be sent to the LVM
+max_images = int(os.getenv("MAX_IMAGES", 1))


In this line, is 1 being set manually? Or is it a default?

This means that if MAX_IMAGES is unset, it will default to 1

dmsuehir added 9 commits November 22, 2024 15:00

Backend enhancements for image query capabilities for MultimodalQnA

51651aa

Fix model name var

f83e2e1

Signed-off-by: dmsuehir <[email protected]>

Merge branch 'mmqna-phase2' of github.com:mhbuehler/GenAIComps into d…

1a61cb5

…ina/image_query

Remove space at end of prompt

1f0dfcd

Signed-off-by: dmsuehir <[email protected]>

Merge branch 'mmqna-phase2' of github.com:mhbuehler/GenAIComps into d…

107680d

…ina/image_query

Add env var for the max number of images sent to the LVM

5b51771

Signed-off-by: dmsuehir <[email protected]>

README update for the MAX_IMAGES env var

242ee6f

Signed-off-by: dmsuehir <[email protected]>

Merge branch 'dina/image_query' of github.com:mhbuehler/GenAIComps in…

8b21819

…to dina/image_query

Remove prints

5b41724

Signed-off-by: dmsuehir <[email protected]>

dmsuehir mentioned this pull request Dec 2, 2024

MultimodalQnA updates to add support for image queries mhbuehler/GenAIExamples#23

Merged

1 task

okhleif-10 and others added 15 commits December 2, 2024 15:39

Merge branch 'mmqna-phase2' of github.com:mhbuehler/GenAIComps into d…

f4a7199

…ina/image_query Signed-off-by: dmsuehir <[email protected]>

Merge branch 'main' into mmqna-audio-query

e1e5fde

[pre-commit.ci] auto fixes from pre-commit.com hooks

70c54e1

for more information, see https://pre-commit.ci

fixed role bug where i never was > 0

6a71843

Signed-off-by: okhleif-IL <[email protected]>

Fix after merge

411bfdf

Signed-off-by: dmsuehir <[email protected]>

removed whitespace

615459b

Signed-off-by: okhleif-IL <[email protected]>

Merge pull request #13 from mhbuehler/omar/role-debug

1753473

Fixed role bug where enumeration was wrong

[pre-commit.ci] auto fixes from pre-commit.com hooks

dcafe8d

for more information, see https://pre-commit.ci

Fix call to get role labels

e32bef4

Signed-off-by: dmsuehir <[email protected]>

Merge branch 'mmqna-audio-query' of github.com:mhbuehler/GenAIComps i…

63c08fe

…nto dina/image_query Signed-off-by: dmsuehir <[email protected]>

Gateway test updates images within the conversation

db22c47

Signed-off-by: dmsuehir <[email protected]>

Adds unit test coverage for audio query

fa47959

Signed-off-by: Melanie Buehler <[email protected]>

Update test to check the returned b64 types

02efc8a

Signed-off-by: dmsuehir <[email protected]>

Update test since we don't expect images from the assistant

d74bb32

Signed-off-by: dmsuehir <[email protected]>

dmsuehir commented Dec 5, 2024

View reviewed changes

mhbuehler and others added 4 commits December 6, 2024 09:12

Port number fix

37826be

Signed-off-by: Melanie Buehler <[email protected]>

Formatting

40d34db

Signed-off-by: Melanie Buehler <[email protected]>

Merge pull request #14 from mhbuehler/melanie/add_test_coverage

6f2a753

Adds unit test coverage for audio query

[pre-commit.ci] auto fixes from pre-commit.com hooks

a665c3c

for more information, see https://pre-commit.ci

ashahba and others added 3 commits December 6, 2024 09:36

Merge branch 'main' into mmqna-audio-query

4a5c8ea

Fixed place where port number is set

d9ab567

Signed-off-by: Melanie Buehler <[email protected]>

Merge pull request #15 from mhbuehler/melanie/port_placement

75b135f

Fix port number placement

okhleif-10 reviewed Dec 6, 2024

View reviewed changes

mhbuehler requested changes Dec 6, 2024

View reviewed changes

dmsuehir added 4 commits December 9, 2024 11:41

Remove old comment and added more accurate description

9a077c5

Signed-off-by: dmsuehir <[email protected]>

add comment in code about MAX_IMAGES

b21e575

Signed-off-by: dmsuehir <[email protected]>

Add Gaudi support for image query

a3abd8a

Signed-off-by: dmsuehir <[email protected]>

Merge branch 'mmqna-audio-query' of github.com:mhbuehler/GenAIComps i…

b8dbabf

…nto dina/image_query

dmsuehir changed the base branch from mmqna-audio-query to mmqna-image-query December 12, 2024 18:32

dmsuehir added 3 commits December 12, 2024 13:02

Merge branch 'mmqna-image-query' of github.com:mhbuehler/GenAIComps i…

c87504c

…nto dina/image_query

Fix to pass the retrieved image last

723f0c3

Signed-off-by: dmsuehir <[email protected]>

Revert out gateway and gateway test code, due to its move to GenAIExa…

b1205f4

…mples Signed-off-by: dmsuehir <[email protected]>

mhbuehler reviewed Dec 13, 2024

View reviewed changes

Fix retriever test for checking for b64_img_str in the result

bac117a

Signed-off-by: dmsuehir <[email protected]>

mhbuehler approved these changes Dec 13, 2024

View reviewed changes

okhleif-10 reviewed Dec 13, 2024

View reviewed changes

dmsuehir merged commit 2eaf136 into mmqna-image-query Dec 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add image query support to the backend microservices#12

Add image query support to the backend microservices#12
dmsuehir merged 39 commits intommqna-image-queryfrom
dina/image_query

dmsuehir commented Dec 2, 2024

Uh oh!

dmsuehir Dec 5, 2024

Uh oh!

dmsuehir Dec 5, 2024

Uh oh!

dmsuehir Dec 5, 2024

Uh oh!

okhleif-10 left a comment

Uh oh!

mhbuehler Dec 6, 2024

Uh oh!

mhbuehler Dec 6, 2024

Uh oh!

dmsuehir commented Dec 12, 2024

Uh oh!

mhbuehler Dec 13, 2024

Uh oh!

dmsuehir Dec 13, 2024

Uh oh!

mhbuehler left a comment

Uh oh!

okhleif-10 Dec 13, 2024

Uh oh!

dmsuehir Dec 13, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

dmsuehir commented Dec 2, 2024

Description

Issues

Type of change

Dependencies

Tests

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

okhleif-10 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmsuehir commented Dec 12, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mhbuehler left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants